# Large language models

Upstage AI
Upstage AI leverages powerful large language models and document processing engines to transform workflows and improve efficiency for enterprises. Its main advantages include high accuracy, high performance, and customized solutions for various industries. It is positioned to empower leading enterprises and enhance work efficiency.
Document processing
37.8K

Comfygen
ComfyGen is an adaptive workflow system focused on text-to-image generation that automates and tailors effective workflows by learning from user prompts. The emergence of this technology marks a shift from using a single model to incorporating multiple specialized components in complex workflows aimed at enhancing image generation quality. A key advantage of ComfyGen is its ability to automatically adjust workflows based on user text prompts, making it especially valuable for users who need to generate images in specific styles or themes.
AI image generation
61.0K
Fresh Picks

DCLM
DataComp-LM (DCLM) is a comprehensive framework for building and training large language models (LLMs), providing standardized corpora, efficient pre-training recipes based on the open_lm framework, and over 50 evaluation methods. DCLM supports researchers in experimenting with different data set construction strategies at different computational scales, from 411M to 7B parameter models. DCLM significantly improves model performance through optimized dataset design and has already facilitated the creation of multiple high-quality datasets that outperform all open datasets at different scales.
AI Model
48.3K

Make An Audio 2
Make-An-Audio 2 is a text-to-audio generation technology based on diffusion models, co-developed by researchers from Zhejiang University, ByteDance, and the Chinese University of Hong Kong. This technology utilizes pre-trained large language models (LLMs) to parse text, optimizing for semantic alignment and temporal consistency, thereby improving the quality of generated audio. It also incorporates a feed-forward Transformer-based diffusion denoiser to enhance performance in generating variable-length audio and bolster the extraction of temporal information. Furthermore, by leveraging LLMs to convert abundant audio label data into audio-text datasets, the issue of time data scarcity is addressed.
AI Music Generation
52.4K

Brainsoup
With BrainSoup, you can create custom AI agents to handle tasks and automate workflows through natural language. Enhance AI capabilities with your data while maintaining the highest levels of privacy and security. BrainSoup supports multiple large language models and semantic core technologies, making AI agents more powerful and personalized.
Intelligent Body
61.0K

FP6 LLM
FP6-LLM is a new supporting solution for large language models. Through six-bit quantization (FP6), it effectively reduces the model size while maintaining model quality across various applications. We present TC-FPx, the first complete GPU kernel design that uniformly supports various quantization bit widths for floating-point weights. By integrating the TC-FPx kernel into existing inference systems, we provide a new end-to-end support for quantized LLM inference (called FP6-LLM), achieving a better balance between inference cost and model quality. Experiments demonstrate that FP6-LLM enables inference of LLaMA-70b using a single GPU, achieving normalized inference throughput 1.69x to 2.65x higher than the FP16 baseline.
AI Model
54.9K

Styletts 2
StyleTTS 2 is a text-to-speech (TTS) model that utilizes large speech language models (SLMs) for style diffusion and adversarial training, achieving human-level TTS synthesis. It employs a diffusion model to model style as a latent stochastic variable, generating the most appropriate style for the given text without relying on voice references. Furthermore, we utilize large pre-trained SLMs (such as WavLM) as discriminators and incorporate our innovative differentiable duration modeling for end-to-end training, enhancing the naturalness of the synthesized speech. StyleTTS 2 surpasses human recordings on the single-speaker LJSpeech dataset and matches them on the multi-speaker VCTK dataset, garnering recognition from native English-speaking evaluators. Additionally, when trained on the LibriTTS dataset, our model outperforms prior publicly available zero-shot extension models. By demonstrating the potential of style diffusion and adversarial training with large SLMs, this work achieves human-level TTS synthesis on both single and multi-speaker datasets.
AI speech synthesis
214.2K
Featured AI Tools

Flow AI
Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.
Video Production
42.8K

Nocode
NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.
Development Platform
44.7K

Listenhub
ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.
AI
42.2K

Minimax Agent
MiniMax Agent is an intelligent AI companion that adopts the latest multimodal technology. The MCP multi-agent collaboration enables AI teams to efficiently solve complex problems. It provides features such as instant answers, visual analysis, and voice interaction, which can increase productivity by 10 times.
Multimodal technology
43.1K
Chinese Picks

Tencent Hunyuan Image 2.0
Tencent Hunyuan Image 2.0 is Tencent's latest released AI image generation model, significantly improving generation speed and image quality. With a super-high compression ratio codec and new diffusion architecture, image generation speed can reach milliseconds, avoiding the waiting time of traditional generation. At the same time, the model improves the realism and detail representation of images through the combination of reinforcement learning algorithms and human aesthetic knowledge, suitable for professional users such as designers and creators.
Image Generation
42.2K

Openmemory MCP
OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.
open source
42.8K

Fastvlm
FastVLM is an efficient visual encoding model designed specifically for visual language models. It uses the innovative FastViTHD hybrid visual encoder to reduce the time required for encoding high-resolution images and the number of output tokens, resulting in excellent performance in both speed and accuracy. FastVLM is primarily positioned to provide developers with powerful visual language processing capabilities, applicable to various scenarios, particularly performing excellently on mobile devices that require rapid response.
Image Processing
41.4K
Chinese Picks

Liblibai
LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.
AI Model
6.9M